Computing Discourse Information with Statistical Methods

Author

Kenneth EL Samuel

Abstract

This dissertation research involves implementing a computer system that, given a natural language dialogue, will automatically tag each utterance with a discourse label (a concise abstraction of the intentional function of the speaker) and a discourse pointer (a focusing mechanism that represents the dialogue context in which an utterance is to be understood) (Samuel 1996). Since the discourse label of an utterance depends on the surrounding dialogue, tagging utterances with discourse labels is similar to the part-of-speech (PoS) tagging problem in syntax. Within the domain of PoS tagging, extensive experimental research has shown that statistical learning algorithms are among the most successful. I will investigate two methods that have been effective in PoS tagging: Hidden Markov Models (HMMs) (Charniak 1993) and Transformation-Based Learning (TBL) (Brill 1995).

Unlike these PoS taggers, which determine a word's tag based on the surrounding words (within a fixed window size), a discourse-tagging system must use the surrounding utterances as input. Thus, the sparse data problem is much more severe for the discourse tagger, since the number of possible utterances is infinite. To alleviate this problem, rather than processing each utterance verbatim (which would probably bombard the system with a great deal of extraneous information that is not relevant to the task at hand), I have identified a small set of features that can be extracted from each utterance to provide the relevant information to the learning algorithm.

Since HMMs and TBL deal with contiguous sequences of discourse labels, they are unable to take focus shifts into consideration, yet it is crucial to account for the focus shifts that frequently occur in discourse. I have proposed a solution to this problem for both algorithms. For HMMs, this involves modifying the Markov assumption slightly, while still retaining the linear-time efficiency of the HMM approach. With TBL, the solution is more straightforward.
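To make the analogy with PoS tagging concrete, the following is a minimal sketch of HMM-style decoding over utterance features, under stated assumptions. The label set, the single cue-based feature, and every probability in the tables are hypothetical values chosen for illustration; none of them come from the dissertation. The sketch uses the standard first-order Markov assumption and does not attempt the focus-shift modification or the discourse pointers described in the abstract.

```python
import math

# Hypothetical label set; not taken from the dissertation.
LABELS = ["SUGGEST", "ACCEPT", "REJECT"]

# Hypothetical probability tables (illustrative numbers only).
START = {"SUGGEST": 0.8, "ACCEPT": 0.1, "REJECT": 0.1}
TRANS = {
    "SUGGEST": {"SUGGEST": 0.2, "ACCEPT": 0.5, "REJECT": 0.3},
    "ACCEPT": {"SUGGEST": 0.6, "ACCEPT": 0.2, "REJECT": 0.2},
    "REJECT": {"SUGGEST": 0.7, "ACCEPT": 0.1, "REJECT": 0.2},
}
EMIT = {
    "SUGGEST": {"question": 0.6, "agree_cue": 0.05, "disagree_cue": 0.05, "other": 0.3},
    "ACCEPT": {"question": 0.05, "agree_cue": 0.7, "disagree_cue": 0.05, "other": 0.2},
    "REJECT": {"question": 0.05, "agree_cue": 0.05, "disagree_cue": 0.7, "other": 0.2},
}


def extract_feature(utterance):
    """Reduce an utterance to one coarse feature instead of using it verbatim."""
    text = utterance.lower().strip()
    if text.endswith("?"):
        return "question"
    words = text.replace(",", " ").replace(".", " ").split()
    if words and words[0] in {"yes", "okay", "sure"}:
        return "agree_cue"
    if words and words[0] in {"no", "sorry"}:
        return "disagree_cue"
    return "other"


def viterbi(features):
    """Return the most probable label sequence for a list of features."""
    if not features:
        return []
    # Log-probabilities avoid underflow on long dialogues.
    scores = {
        label: math.log(START[label]) + math.log(EMIT[label][features[0]])
        for label in LABELS
    }
    backpointers = []
    for feature in features[1:]:
        new_scores, pointers = {}, {}
        for cur in LABELS:
            best_prev = max(
                LABELS, key=lambda prev: scores[prev] + math.log(TRANS[prev][cur])
            )
            new_scores[cur] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][cur])
                + math.log(EMIT[cur][feature])
            )
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Follow the backpointers from the best final label to recover the path.
    path = [max(LABELS, key=lambda label: scores[label])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def tag_dialogue(utterances):
    """Tag each utterance in a dialogue with a (hypothetical) discourse label."""
    return viterbi([extract_feature(u) for u in utterances])
```

For a fixed label set, the decoder does a constant amount of work per utterance, so its running time grows linearly with the length of the dialogue. Replacing each utterance with a feature drawn from a small set is what keeps the emission table finite, which is the role the abstract assigns to feature extraction in easing the sparse data problem.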

Similar resources

The impact of Cloud Computing in the banking industry resources

Today, one of the biggest problems gripping the banking sphere is the high cost of implementing advanced technologies and the efficient use of hardware. Cloud computing, the use of shared services over the Internet, plays a large role in developing the banking system without the need for operating expenses such as staffing, equipment, hardware and software. Reducing the cost of implem...

Kernel Based Discourse Relation Recognition with Temporal Ordering Information

Syntactic knowledge is important for discourse relation recognition. Yet only heuristically selected flat paths and 2-level production rules have been used to incorporate such information so far. In this paper we propose using tree kernel based approach to automatically mine the syntactic information from the parse trees for discourse analysis, applying kernel function to the tree structures di...

EFL Learners' Sensitivity to Linguistic and Discourse Factors in the Process of Anaphoric Resolution

The readers' ability to integrate current information with given information has been considered as an important component of reading comprehension process. One aspect of this integration process involves anaphoric resolution. The purpose of this study is to investigate the process of anaphoric resolution, focusing on inferential rigidity of different types of anaphoric ties. Ninety EFL learner...

A Statistical Model for Discourse Act Recognition in Dialogue Interactions

This paper discusses a statistical model for recognizing discourse intentions of utterances during dialogue interactions. We argue that this recognition process should be based on features of the current utterance as well as on discourse history, and show that taking into account utterance features such as speaker information and syntactic forms of utterances dramatically improves the system’s ...

Publication date: 1999
